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ABSTRACT 

Cellular populations are typically heterogenous collections 
of cells at different points in their respective cell cycles, 
each with a cell cycle time that varies from individual to 
individual. As a result, true single-cell behavior, particu- 
larly that which is cell-cycle-dependent, is often obscured in 
population-level (averaged) measurements. We have devel- 
oped a simple deconvolution method that can be used to re- 
move the effects of asynchronous variability from population- 
level time-series data. In this paper, we summarize some 
recent progress in the development and application of our 
approach, and provide technical updates that result in in- 
creased biological fidelity. We also explore several prelimi- 
nary validation results and discuss several ongoing applica- 
tions that highlight the method's usefulness for estimating 
parameters in differential equation models of single-cell gene 
regulation. 
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1. INTRODUCTION 

In the study of any biological phenomenon, it is impor- 
tant to note that differences between data measured at the 
level of cellular populations and those collected using single 
cells can be significant. This is particularly true for pro- 
cesses that are cell-cycle-dependent, due to the fact that at 



any given time individual cells within the population can be 
found at variable points in their respective cell cycles and 
contribute differently to the population average as a result. 
We refer to this kind of variability as asynchronous variabil- 
ity, a 'feature' of the population that exists even in the ab- 
sence of cell-cell communication and similar 'social' effects, 
and is independent of any stochasticity in the observable of 
interest. 

We have developed a simple deconvolution method that 
can be used to remove the effects of asynchronous variabil- 
ity from population-level time-series data [Tl]. An essen- 
tial component of this technique is an accurate model of 
the population asynchrony, that is, the temporal position of 
cells within their cell cycles (their phase) and their distri- 
bution in the population as a function of experiment time. 
The population asynchrony is organism-specific (and possi- 
bly condition-dependent as well), and although it may be 
difficult to establish experimentally, it is in principle char- 
acterizable for any system of interest. 

In our previous work [Tl] we presented an experimentally- 
validated model for the asynchrony in a population of Caulo- 
bacter crescentus cells. Caulobacter is a dimorphic bac- 
terium that has been established in both experimental and 
computational biology communities as an important model 
organism for the study of cell cycle regulation, cellular dif- 
ferentiation, and the mechanisms of bacterial chromosome 
segregation [sj [7| [lO]. The Caulobacter cell cycle, which 
contains both a motile 'swarmer' (SW) stage and a non- 
motile 'stalked' (ST) stage, is shown in Figure [l] By ap- 
plying our deconvolution technique to expression data for 
a set of Caulobacter genes involved in regulating the cell 
cycle, we showed how critical details obscured in the raw 
population-based measurements may be recovered. 

We now extend our deconvolution method and demon- 
strate explicitly, using a 'toy' oscillator described by a set 
of ordinary differential equations, how deconvolution can be 
used to reconcile population-level expression data with sin- 
gle cell-level mathematical models of gene regulation. The 
essentials of the method are summarized below, and we refer 
the reader to [Tl] for further details. 

2. SUMMARY OF THE DECONVOLUTION 
METHOD 

2.1 Cell cycle phase distribution 

We refer to the position of a cell within its own cell cy- 
cle as the its phase (j>, where < < 1. The phase at 
which an individual cell (indexed k) transitions from SW to 



at time t as an integral transform 
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Figure 1: Caulobacter cell cycle shown with phase 
axis. Caulobacter begins its cycle as a motile 
'swarmer' (SW) and differentiates to a non-motile 
'stalked' (ST) state. Division produces two mor- 
phologically distinct cells. 
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, (sst) 



a random variable normally-distributed in the 



population with mean /is3t=0.15 (updated from the previ- 
ous value of 0.25 with new experimental evidence |5]) 
and CV=0.13 [Tl]. At the outset of a typical batch-culture 
Caulobacter experiment, each cell in the population can be 



found with 4>k{Q) < 



(sst) 



As experiment time t passes, 



cells progress through their cycles at a rate that is the in- 
verse of their total cycle times; that is, 4>k{t) = 0fc(O)-f t/Tfc 
for < t < Tk{l — (jjkiO)), where Tk is the total cycle time of 
cell k. When t = Tk{l — (/f>fe(0)) and the cell reaches the end 
of its cycle, two daughter cells — one an SW cell and one an 
ST cell — emerge. This phase evolution model can be simu- 
lated for a large number of cells and accurately predicts the 
time-dependent cell cycle phase distribution of an initially 
all-SW Caulobacter culture (see Section 4.2 below). 



2.2 Average expression in single cells and at 
the population-level 

The signal intensity of a particular species measured in a 
typical RNA expression assay is proportional to the population- 
level concentration G{t) — R{t)/V(t), where R{t) is the 
number of transcripts in the population and V{t) is the total 
cellular volume. For a large number of cells N{t), 



V{t)^N{t) / Q{<t),t) 



and 



Rit)^N{t) / /(0)g(0,t)d0 



(1) 



(2) 



where Q{4>,'t) is the expected value of a single cell's volume 
Vk{<j>) over dk = {</'i'"''\ Tfc}, and /(</>) is the average expres- 
sion of all cells at the exact same phase for a given mRNA 
species, i.e., the fully synchronized average expression. (See 
for the complete derivation.) We may thus write the 
total concentration of RNA transcripts for a particular gene 



G{t) 



m 

V{t) 

J f{(P)Q{4>,t)d(l> 
JW,t) d(t> 

Q{^,t)f(4>)d4> 



(3) 



where Q{(f),t) = Q{(f),t)/ J Q{(l),t)d(l) is the kernel of the 
transform, and has the interpretation of a fractional volume 
density. That is, Q{4),t) represents the fraction of the total 
population volume at time t that exists in (a small interval 
around) phase Because of the complexity introduced by 
cells going through their cycles at different rates (and, in the 
case of Caulobacter, producing new daughter cells at differ- 
ent phases), the deconvolution method relies on simulation 
methods to evaluate Q[cj>,t) and Q{(j>,t). 

2.3 Estimating single cell expressions from 
population data 

Extracting a single cell expression profile from population 
data involves solving the integral equation Q for /{(j)) from 
a limited set of Nm population measurements {G{ti) . . . 
G(tjv,„)}, taken at times ti, . . . ,tNyn- As N^n is finite and 
small, this inversion process is ill-posed and requires a de- 
gree of regularization; that is, the introduction of additional 
information in order to reduce the degrees of freedom in the 
reconstruction process. To this end, we model f{<j}) as a 
natural cubic spline 



(4) 



where {ipii'j})} represent Nc basis functions, each piecewise 
cubic polynomials, and {ai} are unknown coefficients that 
determine the particular shape of the single cell expression. 

An optimization problem is solved in order to find coef- 
ficients that fit the data well while maintaining smoothness 
consistent with natural expressions and not over-fitting the 
measurements. The cost criterion to be optimized is 



(Gitm) - Gitm)f 



+ \ {f"{<j,f}dcj,., (5) 



where G{tm) ~ J Q{(t>,tm)fct{4')d4> is the model-predicted 
measurement and a,n is the variance of the mth measure- 
ment. The second term in ([sjl is a second derivative regular- 
ization function that encourages smoothness in the estimate. 
The parameter A controls the tradeoff between data fidelity 
and smoothness and may be selected via cross validation [5] 

In addition to smoothness, a number of other physically 
based constraints are included to increase the fidelity of the 
model and improve estimation performance. These include 

1. Positivity. As the expression cannot be negative, we 
impose the constraint that fa{4i) cannot be negative. 

2. Continuity. The expression is also constrained by the 
fact that we must have conservation of RNA species 
across cell division. In other words, the concentration 
of any RNA species at phase = 1 must be equal to 



the volume-weighted sum of concentrations at phases 
4> = and (j) = 4>^i^"'^'' for every cell k. This constraint 
can be modeled as J w{(j>)fa{4')d(t> = 0, where the form 
of w{(f)) depends on the particular cell division model 
(see [n]). 

The single cell estimate fa[(j>) is determined by the set of 
a-coefhcients that minimize ||5| while satisfying all of the 
constraints above. 

3. METHOD UPDATES 

In this section we present two refinements to the deconvo- 
lution method presented in [Tl]. The first proposes a more 
realistic cell volume model where changes during growth and 
division occur smoothly. Similarly, the second constrains 
the resultant expression estimate to enforce continuity in 
the rate of transcription generation across cell division. 

3.1 Smoothness of cell volume function 

The integration kernel Q{(j>,t) depends on Q{(j>,t), which 
in turn relies on a cell volume model Vk{4>) describing the 
volume of the fcth cell as a function of its phase and param- 
eters 9k. It has previously been shown that, on cell division, 
the Caulobacter cell volume is partitioned 40% SW to 60% 
ST [12]. This implies that 



Wfc(O) = 0.4 Vo 



(sst)\ 



0.6 Vo 



(6) 
(7) 

(8) 



where Vo is the cell volume a.t cj> = 1, just prior to division. 
Here we assume that the variance of the final cell size dis- 
tribution is small so that Vb is effectively constant across all 
cells k. Further, the rate of volume change of the two cell 
halves should be continuous across division 



(9) 
(10) 

that is, both the swarmer cell and stalked cell should start 
with the same rate of volume change as the cell just prior 
to division. We adopt a simple piecewise polynomial model 
that satisfies ([6|-|To| and is consistent with known biology: 

«fc(0) = 



Vo X <^ 



r 0.44- 

+ 
+ 

1 



1-4, 
0.6-1. 
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(11) 
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The second piece is linear, and the first piece is approxi- 
mately linear with small quadratic and cubic terms. This 
model extends the purely linear model in 11 which did not 



consider the volume rate-of-change constraints. 

3.2 Smoothness of gene expression 

In addition to the continuity constraint described in Sec- 
tion |2.3| we also expect the rate of change of number of 
RNA transcripts to also be continuous across cell division. 
Intuitively, cells should not, for example, have net tran- 
script creation at a given phase and instantaneously switch 



to net transcript degradation. Mathematically, if Rk{4>) = 
Vk{4')fki4>) represents the number of transcripts from cell fe, 
the constraint may be written in terms of the derivative with 
respect to cell phase, R'^il) = Ri{0) + R'^ {(!>[' Using the 
chain rule for differentiation along with the properties and 
form of the cell volume function in the previous section, the 
constraint may be written 



(12) 



where /?(4"*') = "fc(l)/Vb = 0.4/(1 - ^i'"'). Taking the 
synchronous average, denoted {•), of ( |12[ ) over a large num- 
ber A*' of cells yields 



(/3(0i^"'))/(o) - mr')h 



Jsst) 



(13) 



O.4/'(O) + O.6(/a0i^^'^))-/'(l) 



where f{4>) = {fk{4>)} is the synchronous average expression. 
Noting that the variability between the fk is independent of 
0^""''' , the averages in (131 simplify to 



/3(<^)p(<^) # 



= /3o 



(14) 



iV 



/'(0)p(0)d0, 



(15) 



and 



{/?(<^r')/^(0i=»'))) = l^/3(,^(-*))/,(0(-')) 



sst) \ 



N 

k 

« j Pi<p).n^)pi^)d^, (16) 

where p{cj>) = JV{4>; fisBt,o-sst) is the Gaussian probability 
density function of 0^'"'*'. S 
provides the final constraint 

J w^{^)f{<j>)d<l> 

where 



Substituting (|14}-(|16|) into (|13l 

W2{^)f'{<j>)d^, (17) 



«;i(0) = poS{l -<P)- Pom - PWpW (18) 
«;2(0) = 0.45((^) + 0.6p((^) - 5(1 - 0), (19) 

and 5{(f>) is the dirac delta function. 




Figure 2: 'True' synchronized single cell simulations 
of a simple biological oscillator model, compared to 
the resulting population and deconvolved expres- 
sions. 

4. VALIDATION & APPLICATIONS 

4.1 Biological oscillator model deconvolution 

In this section we explore the performance of the decon- 
volution process by testing it on simulated population data, 
where the 'true' synchronized cell behavior is known. Here a 
particular model of a cell-cycle regulated expression in single 
cells is passed through the forward model using the kernel 
function Q{(f>, t) in order to generate simulated population- 
level data. 

As an example differential equation system, we consider 
the classical Lotka-Volterra model |8j as a biological oscil- 
lator. Although originally developed to represent predator 
prey dynamics, the Lotka-Volterra equations have also been 
applied in a regulatory network context (see, e.g., [I]), mak- 
ing them a useful test case. The model equations are given 
by: 

xi_ = xi{a — hx2) (20) 
X2 = X2{cxi — d) (21) 

Where we take x\ and X2 to represent two chemical species 
which bind and convert X\ to X2- 

We chose parameter values which yield a 150 minute pe- 
riod oscillation (similar to the average cell cycle time for 
Caulobacter) , and convolved the resulting simulations with 
Q{(j},t) to yield a noiseless asynchronous population. We 
then added a several of levels and types of noise to the pop- 
ulation data, and deconvolved the simulated population data 
to test how well the method recovers the known 'single-cell'- 
like behavior. 



Figure 3: 'True' synchronized single cell simulations 
of a simple biological oscillator model, compared to 
one realization of the resulting population and de- 
convolved expressions, where Gaussian error with 
standard deviations equal to 10% of the data mag- 
nitude have been added to the population data. 

Figures [2] and [3] show two examples from these investi- 
gations for the noiseless case and a case where Gaussian 
distributed errors with mean zero and standard deviation 
equal to 10% of the data magnitude have been added to the 
population data. The deconvolution generally performs well 
at recovering the major features of the synchronous cell be- 
havior, and these results suggest that this method may be 
a useful tool for parameter estimation for models developed 
to represent gene regulation in individual cells. 

4.2 Distribution of cell types in Caulobacter 

Using the cell- type distribution model presented here, we 
can simulate the distribution of cell types over time in a 
typical batch-culture Caulobacter experiment, and compare 
this to experimental data on cell type distributions. Sim- 
ulated cells were grouped based on their cell cycle phase 
into swarmer (SW) and stalked (ST), with the ST cells split 
into early stalked (ST_e), early predivisional {ST epd), and 
late predivisional {ST lpd)- We assume a mean SW-ST^ 
transition phase of 0.15 as discussed in Section [2. 1| For the 
ST_B-ST_Bp_D and STepd-STlpd transition phases, we note 
that distinguishing between ST_b and ST epd, and ST_bpd 
and ST_LPD morphologies can be difficult experimentally, so 
we use a range of 0.6-0.7 for ST^-ST^pd and a range of 
0.85-0.9 for STepd-STlpd, based on [4][ll]. The resulting 
time dependent cell type distributions, compared to experi- 
mental data from Judd et al. [2] , are shown in Figure m Our 
cell-type distribution model predicts highly similar distribu- 
tions of each cell type, providing additional experimental 
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Figure 4: The simulated distribution (top panel) of 
a batch-culture population of Caulobacter matches 
the experimentally-observed distribution (bottom 
panel). Experimental data is reproduced from Judd 
et al. 4 . Shaded regions in the simulated distribu- 
tion indicate a range of transition phases, with the 
solid line indicating the midpoint. 



validation support for our cell type model. 

4.3 ftsZ expression in Caulobacter 

To highlight the usefulness of our method with a concrete 
example, we show here the population-level and deconvolved 
data for the Caulobacter ftsZ gene. (Population data was 
taken from ^.) FtsZ is a tubulin homolog essential for bac- 
terial cell division that is produced only after DNA replica- 
tion begins at the SW-to-ST transition [g]. This delay in 
ftsZ transcription cannot be seen in the microarray data, 
but is visible in the deconvolved expression profile (Figure 
[5|, providing validation for the deconvolution method. The 
deconvolution makes an additional new prediction of a large 
drop in the level of ftsZ with no subsequent increase af- 
ter transcript concentration reaches its maximal value at 
(j> ~ 0.4. This deviates significantly from the raw microar- 
ray data, which shows levels of ftsZ increasing towards the 
end of the experiment. 
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Figure 5: Population vs. deconvolved ftsZ expres- 
sion in Caulobacter. The deconvolved data (bottom 
panel) resolves the ftsZ transcription delay (indi- 
cated by the arrow) not observed in the population 
data (top panel). Simulated time is a scaling of the 
cell cycle phase to the average cell cycle time of 150 
minutes. 



an additional smoothness condition, and an updated cell 
volume function. We also present new validation and ap- 
plications using experimental data as well as a differential 
equation model of a simple biological oscillator. We are cur- 
rently extending this work to explore the applications of this 
method in estimating parameters for differential equation 
models of gene regulatory networks, which are typically built 
to model single cell behavior but fitted to population data. 
Our ongoing results suggest that the deconvolution tech- 
nique and asynchronous cell population model yield more 
accurate single cell parameters than fitting to population 
data alone. 
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5. CONCLUSIONS & ONGOING WORK 

To conclude, we have developed a method for using decon- 
volution to synchronize populations of cells in silico, which 
yields expression information that is more representative of 
the 'true' single-cell expression. Here, we presented three up- 
dates to this method: an updated SW-ST transition phase. 
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